Tasks Fairness Scheduler for GPU
Nowadays GPU clusters are available in almost every data processing center. Their GPUs are typically shared by different applications that might have different processing needs and/or different levels of priority. As current GPUs do not support hardware-based preemption mechanisms, it is not possible to ensure the required Quality of Service (QoS) when application kernels are offloaded to devices.
In this work, we present an efficient, low-overhead software preemption mechanism that evicts and relaunches GPU kernels to support different preemptive scheduling policies. We also propose a new fairness-based scheduler, the Fair and Responsive Scheduler (FRS), which uses the current slowdown of each kernel both to select the next kernel to launch and to set the time interval (quantum) it will run for.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
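The idea of choosing both the next kernel and its quantum from the current slowdown can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the FRS policy itself: the abstract gives no formulas, so the `Kernel` fields, the slowdown definition (elapsed time divided by GPU time received) and the quantum rule (`base_quantum` scaled by slowdown, capped at `max_quantum`) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    isolated_time: float   # assumed: run time of the kernel when alone on the GPU
    executed: float = 0.0  # GPU time received so far
    waited: float = 0.0    # time spent evicted or queued

    def slowdown(self) -> float:
        # Assumed definition: elapsed time relative to GPU time actually received.
        if self.executed == 0.0:
            return float("inf") if self.waited > 0 else 1.0
        return (self.executed + self.waited) / self.executed


def pick_next(kernels, base_quantum=1.0, max_quantum=4.0):
    """Pick the most slowed-down kernel; its quantum grows with its slowdown."""
    chosen = max(kernels, key=lambda k: k.slowdown())
    quantum = min(max_quantum, base_quantum * chosen.slowdown())
    return chosen, quantum


def run(kernels, steps):
    """Simulate up to `steps` scheduling decisions; return (name, time run) pairs."""
    trace = []
    for _ in range(steps):
        pending = [k for k in kernels if k.executed < k.isolated_time]
        if not pending:
            break
        chosen, quantum = pick_next(pending)
        ran = min(quantum, chosen.isolated_time - chosen.executed)
        chosen.executed += ran
        for other in pending:
            if other is not chosen:
                other.waited += ran  # evicted kernels accumulate waiting time
        trace.append((chosen.name, ran))
    return trace
```

With two kernels of equal length, the kernel that has waited longest is selected next and receives a longer quantum, which is the qualitative behaviour the abstract describes.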
A Framework For TV Logos Learning Using Linear Inverse Diffusion Filters For Noise Removal
Logotypes are significant cues for video annotation. A combination of temporal and spatial segmentation methods can be used to extract logos from various video contents. To achieve this segmentation, pixels with low variation of intensity over time are detected. Static backgrounds can become spurious parts of these logos. This paper offers a new way to use several segmentations of logos to learn new logo models from which noise has been removed. First, we group segmented logos of similar appearance into clusters. Then, a model is learned for each cluster that has a minimum number of members. This is done by applying a linear inverse diffusion filter to all logos in each cluster. Our experiments demonstrate that this filter removes most of the noise that was added to the logo during segmentation, and that it successfully copes with misclassified logos that have been wrongly added to a cluster.
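The temporal segmentation step (detecting pixels whose intensity varies little over time) can be sketched as follows. This is an illustrative sketch only: the function name `static_pixel_mask`, the frame representation (nested lists of grey levels) and the threshold `max_variance` are assumptions, and the clustering and linear inverse diffusion filtering stages are not covered.

```python
from statistics import pvariance


def static_pixel_mask(frames, max_variance=4.0):
    """Mark pixels whose intensity barely changes across the given frames.

    frames: list of 2D lists (rows of grey levels), all with the same shape.
    Returns a 2D list of booleans; True marks a logo candidate pixel.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [
            pvariance([frame[y][x] for frame in frames]) <= max_variance
            for x in range(width)
        ]
        for y in range(height)
    ]
```

Note that a static background would pass the same test, which is the source of the spurious logo parts mentioned in the abstract.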
Evaluation of CNN architectures for gait recognition based on optical flow maps
This work targets the identification of people in video based on the way they walk (i.e., gait) by using deep learning architectures. We explore the use of convolutional neural networks (CNNs) for learning high-level descriptors from low-level motion features (i.e., optical flow components). The low number of training samples for each subject and the use of a test set containing subjects different from the training ones make the search for a good CNN architecture a challenging task.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
CUVLE: Variable-Length Encoding on CUDA
Data compression is the process of representing information in a compact form, in order to reduce the storage requirements and, hence, communication bandwidth. It has been one of the critical enabling technologies for the ongoing digital multimedia revolution for decades. In the variable-length encoding (VLE) compression method, most frequently occurring symbols are replaced by codes with shorter lengths. As it is a common strategy in many compression applications, efficient parallel implementations of VLE are very desirable. In this paper we present CUVLE, a GPU implementation of VLE on CUDA. Our approach is on average more than 20 and 2 times faster than the corresponding CPU serial implementation and the only known state-of-the-art GPU implementation, respectively.
Junta de Andalucía, TIC-1692. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
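To make the variable-length encoding idea concrete, here is a minimal serial sketch that builds a Huffman code table, one common way of assigning shorter codes to more frequent symbols. It is not the CUVLE algorithm and says nothing about its parallel GPU design; the function names are hypothetical and the bit stream is kept as a string of "0"/"1" characters for readability.

```python
import heapq
from collections import Counter


def build_codes(data):
    """Build a prefix-free code table where frequent symbols get shorter codes."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code suffix so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(data, codes):
    """Concatenate the code of every symbol."""
    return "".join(codes[s] for s in data)


def decode(bits, codes):
    """Decode by extending the current prefix until it matches a code."""
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)
```

Because codes have different lengths, the output position of each symbol depends on all preceding symbols, which is what makes parallelising the encoding step non-trivial.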
A Hybrid Piece-Wise Slowdown Model for Concurrent Kernel Execution on GPU
Concurrent execution of kernels on GPUs improves the use of hardware resources and reduces the execution time of co-executed kernels. In addition, it allows efficient kernel-oriented scheduling policies based on fairness or Quality of Service criteria to be implemented. However, the co-execution performance achieved strongly depends on how GPU resources are partitioned between kernels. Thus, precise slowdown models that accurately predict co-execution performance must be used to fulfill scheduling policy requirements.
Most recent slowdown models work with Spatial Multitask (SMT) partitioning, where Streaming Multiprocessors (SMs) are distributed among tasks. In this work, we show that Simultaneous Multikernel (SMK) partitioning, where kernels share the SMs, obtains better performance. However, kernel interference in SMK occurs not only in global memory, as in the SMT case, but also within the SM, leading to high prediction errors. Here, we propose a modification of a previous state-of-the-art slowdown model that reduces the median prediction error from 27.92% to 9.50%. Moreover, this new slowdown model is used to implement a scheduling policy that improves fairness by 1.41x on average compared to even partitioning, whereas previous models reach only 1.21x on average.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
P18-FR-3130
UMA20-FEDERJA-059
PID2019-105396RB-I0
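The quantities named in the abstract above (slowdown, fairness, median prediction error) can be made concrete with a short sketch. The definitions used here are common choices and are assumptions, since the abstract does not state them: slowdown as co-execution time over isolated time, fairness as the ratio of the smallest to the largest slowdown, and prediction error as relative error against the measured slowdown.

```python
from statistics import median


def slowdown(t_shared, t_alone):
    """Assumed definition: co-execution time divided by isolated time."""
    return t_shared / t_alone


def fairness(slowdowns):
    """Assumed metric: min/max slowdown ratio; 1.0 means all kernels slow down equally."""
    return min(slowdowns) / max(slowdowns)


def median_relative_error(predicted, measured):
    """Median of |predicted - measured| / measured over all kernel pairs."""
    return median(abs(p - m) / m for p, m in zip(predicted, measured))
```

Under these definitions, a scheduler guided by a more accurate slowdown model can choose resource partitions that bring the slowdowns of co-running kernels closer together, raising the fairness value.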
Gait recognition and fall detection with inertial sensors
In contrast to visual information, which is recorded by cameras installed at fixed locations, inertial information can be obtained from the mobile phones that people commonly use in daily life. In this talk we present a general deep learning approach for gait and soft biometrics (age and gender) recognition. We also study the use of gait information to detect actions during walking, specifically fall detection. We perform a thorough experimental evaluation of the proposed approach on different datasets: OU-ISIR Biometric Database, DFNAPAS, SisFall, UniMiB-SHAR and ASLH. The experimental results show that inertial information can be used for gait recognition and fall detection with state-of-the-art results.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
A weakly-supervised approach for discovering common objects in airport video surveillance footage
Object detection in video is a relevant task in computer vision. Current standard detectors are typically trained in a strongly supervised way, which requires a huge amount of labelled data. In contrast, in this paper we focus on object discovery in video sequences using sets of unlabelled data. We present an approach based on two region proposal algorithms (a pretrained Region Proposal Network and an Optical Flow Proposal) that produce regions of interest, which are then grouped using a clustering algorithm. Therefore, our system does not require human collaboration except for assigning human-understandable labels to the discovered clusters. We evaluate our approach on a set of videos recorded at the outdoor area of an airport where the aeroplanes park to load passengers and luggage (apron area).
Our experimental results suggest that the use of an unsupervised approach is valid for automatic object discovery in video sequences, obtaining a CorLoc of 86.8 and a mAP of 0.374 compared to a CorLoc of 70.4 and mAP of 0.683 achieved by a supervised Faster R-CNN trained and tested on the same dataset.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
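For readers unfamiliar with the CorLoc figure reported above, the following sketch computes it under its usual definition: the percentage of images in which the predicted box overlaps some ground-truth box with an intersection over union (IoU) of at least 0.5. The box format `(x1, y1, x2, y2)`, the one-prediction-per-image convention and the function names are assumptions; the exact evaluation protocol of the paper is not given in the abstract.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def corloc(predictions, ground_truth, threshold=0.5):
    """Percentage of images whose predicted box matches any ground-truth box.

    predictions: one box per image; ground_truth: list of boxes per image.
    """
    hits = sum(
        1
        for pred, boxes in zip(predictions, ground_truth)
        if any(iou(pred, gt) >= threshold for gt in boxes)
    )
    return 100.0 * hits / len(predictions)
```

CorLoc only checks whether an object was localized in each image, whereas mAP also penalizes false positives and ranking errors, which is one reason the two metrics can rank methods differently.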
Efficient OpenCL-based concurrent tasks offloading on accelerators
Current heterogeneous platforms with CPUs and accelerators can launch several independent tasks simultaneously in order to exploit concurrency among them. These tasks typically consist of data transfer commands and kernel computation commands. In this paper we develop a runtime approach to optimize the concurrency between data transfers and kernel computation commands in a multithreaded scenario where each CPU thread offloads tasks to the accelerator. It deploys a heuristic based on a temporal execution model for concurrent tasks, and is able to establish a near-optimal task execution order that significantly reduces the total execution time, including data transfers. Our approach has been evaluated using five different benchmarks composed of real tasks dominated by either kernel computation or data transfer. In these experiments our heuristic achieves speedups of up to 1.5x on AMD R9 and NVIDIA K20c accelerators and 1.3x on an Intel Xeon Phi (KNC) device.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
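Why task order matters can be seen with a deliberately simplified model. The sketch below assumes each task is one host-to-device transfer followed by one kernel, that transfers are serialized on a single copy engine, that kernels are serialized on the compute engine, and that a transfer may overlap with the kernel of an earlier task. It then searches all orders exhaustively. Both the model and the exhaustive search are stand-ins for illustration, not the temporal execution model or the heuristic of the paper.

```python
from itertools import permutations


def makespan(order):
    """Total time for tasks given as (transfer_time, kernel_time) pairs."""
    transfer_end = 0.0
    kernel_end = 0.0
    for transfer, kernel in order:
        transfer_end += transfer  # transfers run back to back
        # A kernel starts once its data has arrived and the previous kernel is done.
        kernel_end = max(kernel_end, transfer_end) + kernel
    return kernel_end


def best_order(tasks):
    """Exhaustive search; only practical for a handful of tasks."""
    return min(permutations(tasks), key=makespan)
```

For example, placing a short-transfer, long-kernel task first lets the long transfer of the next task hide behind that kernel. A runtime heuristic is needed in practice because exhaustive search grows factorially with the number of tasks.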
Gait recognition applying Incremental learning
When new knowledge needs to be included in a classifier, the model is retrained from scratch using a huge training set that contains all available information of both old and new knowledge. In this talk, however, we present a way to include new information in a previously trained model without training from scratch, using only a small subset of the old data. We perform a thorough experimental evaluation of the proposed approach on two image classification datasets: CIFAR-100 and ImageNet. The experimental results show that it is possible to include new knowledge in a model without forgetting the previous knowledge, although the performance is still lower than that of training from scratch with the complete training set.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
Low-textured regions detection for improving stereoscopy algorithms
The main goal of stereoscopy algorithms is the calculation of the disparity map between two frames of the same scene captured simultaneously by two different cameras. The difference in position (disparity) at which a common scene point is projected on the two camera sensors can be used to calculate the depth of that point. Many algorithms calculate the disparity of corresponding points in both frames by relying on the existence of similar textured areas around the pixels to be analyzed. Unfortunately, real images contain large areas with low texture, which hinder the calculation of the disparity map. In this paper we present a method that employs a set of local textures to build a classifier that selects reliable pixels where the disparity can be accurately calculated, improving the precision of the scene map obtained by the stereoscopic technique.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. Ministry of Education and Science of Spain under contract TIN2010-16144 and Junta de Andalucía under contract TIC-1692
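The relation between disparity and depth mentioned above can be written down directly for the standard rectified pinhole stereo setup, which is assumed here: depth equals focal length times baseline divided by disparity. The function and parameter names are illustrative, and the texture classifier of the paper is not reproduced.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres of a scene point in a rectified stereo pair.

    disparity_px: horizontal shift of the point between the two images, in pixels.
    focal_px: focal length in pixels; baseline_m: distance between cameras in metres.
    """
    if disparity_px <= 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_px * baseline_m / disparity_px
```

Because depth is inversely proportional to disparity, a small disparity error in a low-texture region can produce a large depth error, which is why selecting reliable pixels matters.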